The NXT-format Switchboard Corpus: a rich resource for investigating the syntax, semantics, pragmatics and prosody of dialogue

Authors

  • Sasha Calhoun
  • Jean Carletta
  • Jason M. Brenier
  • Neil Mayo
  • Daniel Jurafsky
  • Mark Steedman
  • David Beaver
Abstract

This paper describes a recently completed common resource for the study of spoken discourse, the NXT-format Switchboard Corpus. Switchboard is a long-standing corpus of telephone conversations (Godfrey et al., 1992). We have brought together transcriptions with existing annotations for syntax, disfluency, speech acts, animacy, information status, coreference, and prosody; along with substantial new annotations of focus/contrast, more prosody, syllables and phones. The combined corpus uses the format of the NITE XML Toolkit, which allows these annotations to be browsed and searched as a coherent set (Carletta et al., 2005). The resulting corpus is a rich resource for the investigation of the linguistic features of dialogue and how they interact. As well as describing the corpus itself, we discuss our approach to overcoming issues involved in such a data integration project, relevant to both users of the corpus and others in the language resource community undertaking similar projects.


Similar resources

Reverse Engineering of Network Software Binary Codes for Identification of Syntax and Semantics of Protocol Messages

Reverse engineering of network applications, especially from the security point of view, is of high importance and interest. Many network applications use proprietary protocols whose specifications are not publicly available. Reverse engineering of such applications could provide us with vital information to understand their embedded unknown protocols. This could facilitate many tasks including d...


Using the NITE XML Toolkit on the Switchboard Corpus to Study Syntactic Choice: a Case Study

The NITE XML Toolkit (NXT) provides library support for working with multimodal language corpora. We describe our experiences in using it to study discourse effects on syntactic choice using the parsed Switchboard Corpus as a starting point, as a case study for others who may wish to adopt similar techniques using NXT or one of the other libraries that are beginning to emerge. We discuss conver...


Perceiving surprise on cue words: prosody and semantics interact on right and really

Cue words in dialogue have different interpretations depending on context and prosody. This paper presents a corpus study and perception experiment investigating when prosody causes right and really to be perceived as questioning or expressing surprise. Pitch range is found to be the best cue for surprise. This extends to the question rating for really but not for right. In fact, prosody appears t...


ADAM: The SI-TAL Corpus of Annotated Dialogues

In this paper we describe the methodological assumptions, general architectural framework, and annotation and encoding practices underlying the ADAM Corpus, which has been developed as part of the Italian national project SI-TAL. Each of the 450 dialogues is represented by an orthographic transcription and is annotated at five levels of linguistic information, namely prosody, POS tagging, syntax...


Dissertation Proposal Dialogue Glue: Cue Words and Prosody

This dissertation is about how semantics, pragmatics and speech prosody conspire to glue a dialogue together. This proposal focuses on the meaning and use of cue words. This set of discourse markers includes backchannels like uh-huh and okay, agreements like right, and questioning particles like really. Understanding these sorts of markers is important because they indicate both when things a...



Journal:
  • Language Resources and Evaluation

Volume 44


Publication date: 2010